Skip to content

sink(cloudstorage): add use-table-id-as-path option#4356

Open
yinshuangfei wants to merge 7 commits intopingcap:masterfrom
yinshuangfei:cdc_path_table_id
Open

sink(cloudstorage): add use-table-id-as-path option#4356
yinshuangfei wants to merge 7 commits intopingcap:masterfrom
yinshuangfei:cdc_path_table_id

Conversation

@yinshuangfei
Copy link

@yinshuangfei yinshuangfei commented Mar 5, 2026

What problem does this PR solve?

Issue Number: close #4357

What is changed and how it works?

  • Adds config use-table-id-as-path.

The use-table-id-as-path option switches the path to use table_id instead of table_name when it set to true.
With configuration use-table-id-as-path=true in sink uri, for example: --sink-uri="s3://cdc&use-table-id-as-path=true", the cdc path changed from

test_db/table_name/5/2024-01-01/CDC_xxx.json

to

12345/5/2024-01-01/CDC_xxx.json

In this mode, we adjust cloud storage path generation to omit schema when table-id-as-path is enabled and skip DB schema writes.

  • Adds use_table_id_as_path into v2 API conversion and sink config parsing.
  • Remove partition_id from the path to prevent duplicates.

Check List

Tests

  • Unit test

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • New Features

    • Added optional use-table-id-as-path configuration parameter for cloud storage sink, enabling file organization by numeric table IDs instead of table names.
  • Bug Fixes

    • Enhanced validation for exchange partition operations to properly detect and handle missing source table information.
  • Tests

    • Added test coverage for table ID-based path generation and exchange partition error handling.

Summary by CodeRabbit

  • New Features

    • Optional cloud storage setting to use numeric table IDs in file/directory paths (URI/config/API support).
  • Bug Fixes

    • Stronger validation for exchange-partition DDL events to reject missing/invalid source info.
    • DDL/schema writes now include explicit table identifiers and skip database-level schema files when table-ID-as-path is enabled.
    • Improved error handling around index/data path generation.
  • Tests

    • Added tests for table-ID paths, skipping DB schemas, and invalid exchange-partition events.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 5, 2026
@pingcap-cla-assistant
Copy link

pingcap-cla-assistant bot commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 5, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the cloud storage sink by introducing a new configuration option, use-table-id-as-path. This option provides users with the flexibility to specify whether the output directory structure for table-related data should utilize the table's unique ID instead of its name, which can be beneficial for data management and consistency. The changes involve updating the sink's file writing logic and path generation to respect this new setting, along with corresponding test coverage. Additionally, a critical fix for DDL event processing ensures that exchange partition operations are handled correctly by validating the presence of source table information.

Highlights

  • New Configuration Option: Introduced use-table-id-as-path for the cloud storage sink, allowing output paths to use table IDs instead of table names for better organization and uniqueness.
  • Enhanced Path Generation Logic: Updated internal file path generation functions to conditionally use the table's unique ID based on the new configuration setting.
  • Improved DDL Event Handling: Added validation for ActionExchangeTablePartition DDL events to ensure that source table information is always present, preventing potential errors.
  • Expanded Test Coverage: Included new unit tests to verify the functionality of table ID-based paths and the robustness of the DDL event validation.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • downstreamadapter/sink/cloudstorage/sink.go
    • Validated ActionExchangeTablePartition DDL events for missing source table info.
    • Modified writeFile function to accept tableID and updated its calls.
  • downstreamadapter/sink/cloudstorage/sink_test.go
    • Added test for use-table-id-as-path option in DDL event writing.
    • Included test for invalid ActionExchangeTablePartition DDL events.
  • pkg/sink/cloudstorage/config.go
    • Added UseTableIDAsPath field to urlConfig and Config structs.
    • Implemented parsing of use-table-id-as-path from sink URI.
  • pkg/sink/cloudstorage/config_test.go
    • Updated tests to include use-table-id-as-path in URI parsing and config application.
  • pkg/sink/cloudstorage/path.go
    • Introduced generateTablePath to conditionally use table name or ID.
    • Updated CheckOrWriteSchema and generateDataDirPath to use the new path generation logic.
  • pkg/sink/cloudstorage/path_test.go
    • Added test for data file path generation using table ID.
  • pkg/sink/cloudstorage/table_definition.go
    • Modified GenerateSchemaFilePath to accept useTableIDAsPath and tableID parameters.
    • Updated schema file path generation to use table ID when configured.
  • pkg/sink/cloudstorage/table_definition_test.go
    • Updated tests for GenerateSchemaFilePath to cover the useTableIDAsPath option.
Activity
  • Implemented a new configuration option for cloud storage sink path generation.
  • Refactored DDL event handling to include robust validation for exchange partition operations.
  • Added comprehensive unit tests to ensure the correctness of the new feature and the DDL event fix.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 29448a0b-8b0c-4950-9963-71a751107e2a

📥 Commits

Reviewing files that changed from the base of the PR and between 1e51de0 and 095fb1b.

📒 Files selected for processing (1)
  • pkg/sink/cloudstorage/path.go

📝 Walkthrough

Walkthrough

Adds a use-table-id-as-path option across API, config, and cloudstorage sink; path generation and schema/index/data file APIs now accept tableID and return errors; sink writes schema files using explicit tableIDs (exchange requires source TableInfo); tests updated and new validations added.

Changes

Cohort / File(s) Summary
Configuration & API
pkg/sink/cloudstorage/config.go, pkg/sink/cloudstorage/config_test.go, api/v2/model.go, api/v2/model_test.go
Add UseTableIDAsPath flag to Config and API types, bind use-table-id-as-path URI param, merge/apply value, and add tests for parsing and API round-trip.
Path & Table-definition logic
pkg/sink/cloudstorage/path.go, pkg/sink/cloudstorage/path_test.go, pkg/sink/cloudstorage/table_definition.go, pkg/sink/cloudstorage/table_definition_test.go
Change path APIs to accept useTableIDAsPath/tableID, add generateTablePath, switch to error returns instead of panics, validate schema/version/tableID, and update tests for table-ID-based paths.
Sink implementation & tests
downstreamadapter/sink/cloudstorage/sink.go, downstreamadapter/sink/cloudstorage/sink_test.go
Require source TableInfo for exchange-partition DDL, write target and source table defs with explicit tableIDs, update writeFile signature to accept tableID, adjust path-generation calls, and add/extend tests (including invalid exchange cases and table-id-as-path scenarios).
Writer changes
downstreamadapter/sink/cloudstorage/writer.go
Capture and handle errors from GenerateIndexFilePath during flush, log contextual info, and return on failure instead of proceeding with invalid paths.
Config plumbing & CLI
pkg/config/sink.go, pkg/config/changefeed.go
Add use-table-id-as-path SinkURI key and SinkConfig field, parse and propagate the param, include in compatibility checks, and clear field in rmStorageOnlyFields for non-storage sinks.

Sequence Diagram(s)

sequenceDiagram
    participant CDC as CDC/EventProducer
    participant Sink as CloudStorageSink
    participant Path as PathGenerator
    participant Writer as StorageWriter

    CDC->>Sink: Emit DDL event (includes tableID / exchange info)
    Sink->>Sink: Validate event (ensure exchange source TableInfo present)
    Sink->>Path: GenerateSchemaFilePath(useTableIDAsPath, tableID)
    Path-->>Sink: schemaFilePath or error
    Sink->>Writer: GenerateIndexFilePath / GenerateDataFilePath
    Writer-->>Sink: index/data path or error
    Sink->>Writer: Write schema/index/data files
    Writer-->>CDC: Ack or error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

lgtm, approved

Suggested reviewers

  • wk989898
  • hongyunyan

Poem

🐰 I hopped through configs with a tiny cheer,

Swapped names for numbers so paths are clear,
I checked each schema and guarded every test,
Tables now find home by ID, not guessed,
🥕 A little rabbit's tidy file-path fest.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main change: adding a new use-table-id-as-path option to cloud storage sink configuration.
Description check ✅ Passed The PR description addresses the main problem, explains how the feature works with clear examples, lists test coverage, and includes issue reference (close #4357).
Linked Issues check ✅ Passed All key requirements from issue #4357 are met: configuration switch added, path omits schema when enabled, DB schema writes skipped, and unit tests included.
Out of Scope Changes check ✅ Passed All changes directly support the use-table-id-as-path feature: configuration parsing, path generation logic, validation for exchange partitions, and related test coverage are all in scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a use-table-id-as-path option for the cloud storage sink, enabling paths to be constructed using table IDs instead of table names. While the implementation is well-tested and appears correct, a security audit identified that the use of log.Panic for handling invalid or unexpected input from external sources (DDL events or storage files) could lead to Denial of Service (DoS). It is recommended to replace panics with proper error handling to ensure the stability of the replication process. Additionally, there are minor refactoring opportunities to improve code reuse and test structure.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@downstreamadapter/sink/cloudstorage/sink_test.go`:
- Around line 289-293: The test is missing setup of the mock PD clock before
creating the sink; call appcontext.SetService to register the mock PD clock
service (same mock used in other tests) prior to invoking newSinkForTest so the
sink and any PD clock accessors see the mock; specifically, in the test where
ctx, cancel := context.WithCancel(context.Background()) and cloudStorageSink,
err := newSinkForTest(...) are created, insert the mock PD clock registration
(via appcontext.SetService(mockPDClock)) using the same mock instance used by
TestWriteDDLEvent/TestWriteDDLEventWithTableIDAsPath and ensure it is cleaned up
or reset after the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef1e6151-8c9d-4aad-8422-14f382fcb166

📥 Commits

Reviewing files that changed from the base of the PR and between 039417c and 74c09c3.

📒 Files selected for processing (8)
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/config_test.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition.go`:
- Around line 334-343: The GenerateSchemaFilePath function allows
useTableIDAsPath to produce a "0" path when tableID is zero; add a guard at the
top of GenerateSchemaFilePath to validate tableID (e.g., tableID > 0) when
useTableIDAsPath is true and return a clear error if invalid, before calling
generateTablePath or computing checksum, so callers fail fast instead of writing
into an ambiguous "0" directory; reference: GenerateSchemaFilePath,
useTableIDAsPath, tableID, and generateTablePath.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 402c67b5-c910-49f1-b230-ddc3b6062f96

📥 Commits

Reviewing files that changed from the base of the PR and between 74c09c3 and 9326b6d.

📒 Files selected for processing (2)
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/sink/cloudstorage/table_definition.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition_test.go`:
- Around line 524-529: The test assertions in table_definition_test.go expect
the old error text "tableID must be positive" but GenerateSchemaFilePath now
returns "invalid table id for table-id path"; update the two assertions that
call GenerateSchemaFilePath (the cases with tableID 0 and -1) to assert that
err.Error() contains "invalid table id for table-id path" instead of the old
string so the test matches the current implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 213ab268-deb6-4709-a28f-0f9b8b979e5f

📥 Commits

Reviewing files that changed from the base of the PR and between 7296adb and 1e5ab99.

📒 Files selected for processing (2)
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 5, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/sink/cloudstorage/table_definition_test.go (1)

522-529: Add one test that documents intended behavior when tableID is unset.

These assertions currently codify hard-fail on tableID == 0/-1. Please add a companion case that captures the intended production behavior for missing IDs (fallback vs fail-fast), so the contract stays explicit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/table_definition_test.go` around lines 522 - 529, Add a
companion unit test for the missing/unset tableID case that calls
def.GenerateSchemaFilePath(false, 0) (i.e., use-table-id-as-path = false with
tableID == 0) and assert the function returns no error and a schema path that
uses the table name/default fallback rather than failing; this makes the
contract explicit alongside the existing failure assertions for
GenerateSchemaFilePath(true, 0) and GenerateSchemaFilePath(true, -1).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition.go`:
- Around line 355-360: The check that hard-fails when tableID <= 0 inside the
useTableIDAsPath branch should be changed to gracefully fall back to using the
table name path instead of returning
errors.ErrInternalCheckFailed.GenWithStackByArgs; update the logic around the
useTableIDAsPath && t.Table branch (referencing useTableIDAsPath, t.Table,
tableID, and the current error return) so that if tableID is unset/<=0 you skip
the error and proceed to generate a path based on t.Table (or the existing
name-based path logic) rather than aborting.

---

Nitpick comments:
In `@pkg/sink/cloudstorage/table_definition_test.go`:
- Around line 522-529: Add a companion unit test for the missing/unset tableID
case that calls def.GenerateSchemaFilePath(false, 0) (i.e., use-table-id-as-path
= false with tableID == 0) and assert the function returns no error and a schema
path that uses the table name/default fallback rather than failing; this makes
the contract explicit alongside the existing failure assertions for
GenerateSchemaFilePath(true, 0) and GenerateSchemaFilePath(true, -1).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 77b845f5-35c1-4489-aa13-e29feadc64c4

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5ab99 and 3302635.

📒 Files selected for processing (3)
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

@tenfyzhong
Copy link
Collaborator

/check-issue-triage-complete

@yinshuangfei yinshuangfei changed the title sink(cloudstorage): add use-table-id-as-path option DNM: sink(cloudstorage): add use-table-id-as-path option Mar 5, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yinshuangfei
Copy link
Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@yinshuangfei yinshuangfei changed the title DNM: sink(cloudstorage): add use-table-id-as-path option sink(cloudstorage): add use-table-id-as-path option Mar 6, 2026
@yinshuangfei
Copy link
Author

@coderabbitai resume

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sink/cloudstorage/path.go (1)

398-418: ⚠️ Potential issue | 🟠 Major

Avoid duplicating the physical table ID in partition paths.

With both UseTableIDAsPath and EnablePartitionSeparator enabled, Line 400-403 and Line 417 both append tbl.TableNameWithPhysicTableID.TableID. For partitioned tables this generates <physicalTableID>/<version>/<physicalTableID>/..., so the extra segment stops carrying distinct partition information and changes the path layout unexpectedly.

💡 One safe direction if physical table ID is already the top-level key
-	if f.config.EnablePartitionSeparator && tbl.TableNameWithPhysicTableID.IsPartition {
+	if f.config.EnablePartitionSeparator &&
+		tbl.TableNameWithPhysicTableID.IsPartition &&
+		!f.config.UseTableIDAsPath {
 		elems = append(elems, fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/path.go` around lines 398 - 418, The code appends the
physical TableID twice when both f.config.UseTableIDAsPath and
f.config.EnablePartitionSeparator are true; update the partition-separator block
so it only appends the extra TableID when it is not already included by
generateTablePath. Concretely, in the branch that currently appends
fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID) under
f.config.EnablePartitionSeparator, add a guard to skip this append when
f.config.UseTableIDAsPath is true (or when generateTablePath was called with the
TableID-included flag), referencing f.config.UseTableIDAsPath,
f.config.EnablePartitionSeparator, tbl.TableNameWithPhysicTableID.TableID,
generateTablePath, and f.versionMap[tbl].
🧹 Nitpick comments (1)
pkg/sink/cloudstorage/path.go (1)

218-232: Remove the second invalid-table-ID check.

Line 218 already routes through def.GenerateSchemaFilePath(...), which is the canonical validation point for use-table-id-as-path. The extra branch at Line 230-232 is dead on this path and duplicates the error literal, so the two sites can drift again.

♻️ Suggested cleanup
 	tblSchemaFile, err := def.GenerateSchemaFilePath(f.config.UseTableIDAsPath, table.TableNameWithPhysicTableID.TableID)
 	if err != nil {
 		return false, err
 	}
 	exist, err := f.storage.FileExists(ctx, tblSchemaFile)
 	if err != nil {
 		return false, err
 	}
 	if exist {
 		f.versionMap[table] = table.TableInfoVersion
 		return false, nil
 	}
-	if f.config.UseTableIDAsPath && table.TableNameWithPhysicTableID.TableID <= 0 {
-		return false, errors.ErrInternalCheckFailed.GenWithStackByArgs("invalid table id for table-id path")
-	}

Based on learnings: GenerateSchemaFilePath(useTableIDAsPath bool, tableID int64) intentionally returns ErrInternalCheckFailed when tableID <= 0 and useTableIDAsPath=true.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/path.go` around lines 218 - 232, The code duplicates
validation for table IDs: GenerateSchemaFilePath(f.config.UseTableIDAsPath,
table.TableNameWithPhysicTableID.TableID) already returns ErrInternalCheckFailed
when UseTableIDAsPath is true and tableID <= 0, so remove the redundant
conditional that checks f.config.UseTableIDAsPath &&
table.TableNameWithPhysicTableID.TableID <= 0 (the branch after exist check that
returns ErrInternalCheckFailed) to avoid dead/duplicated validation; keep the
call to def.GenerateSchemaFilePath, the FileExists check (f.storage.FileExists),
and the versionMap assignment (f.versionMap[table] = table.TableInfoVersion)
untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@downstreamadapter/sink/cloudstorage/sink.go`:
- Around line 227-234: The early-return in sink.writeFile incorrectly uses
!def.IsTableSchema(), causing invalid table-level DDLs to be silently skipped in
UseTableIDAsPath mode; change the condition to explicitly check for a true
database-level DDL by using def.Table == "" (and keep the surrounding
cfg.UseTableIDAsPath check) so only genuine schema-level events are skipped and
invalid table defs will proceed to GenerateSchemaFilePath and surface validation
errors; reference symbols: sink.writeFile, cfg.UseTableIDAsPath, def.Table,
def.IsTableSchema(), GenerateSchemaFilePath.

In `@pkg/config/sink.go`:
- Around line 954-965: The compatibility check must include the new
UseTableIDAsPath flag so sink-URI-only changes are detected: update
CheckCompatibilityWithSinkURI to parse UseTableIDAsPath from params (as done in
the shown block) and compare it against the current s.UseTableIDAsPath value; if
they differ, populate cfgInSinkURI and cfgInFile for UseTableIDAsPath (similar
to Protocol and TxnAtomicity) and do not let the function return early ignoring
this difference. Ensure you reference the same symbols (UseTableIDAsPath,
CheckCompatibilityWithSinkURI, cfgInSinkURI, cfgInFile, params.Get) when adding
this comparison so URI-driven toggles are correctly treated as
compatibility-breaking.

---

Outside diff comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 398-418: The code appends the physical TableID twice when both
f.config.UseTableIDAsPath and f.config.EnablePartitionSeparator are true; update
the partition-separator block so it only appends the extra TableID when it is
not already included by generateTablePath. Concretely, in the branch that
currently appends fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID)
under f.config.EnablePartitionSeparator, add a guard to skip this append when
f.config.UseTableIDAsPath is true (or when generateTablePath was called with the
TableID-included flag), referencing f.config.UseTableIDAsPath,
f.config.EnablePartitionSeparator, tbl.TableNameWithPhysicTableID.TableID,
generateTablePath, and f.versionMap[tbl].

---

Nitpick comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 218-232: The code duplicates validation for table IDs:
GenerateSchemaFilePath(f.config.UseTableIDAsPath,
table.TableNameWithPhysicTableID.TableID) already returns ErrInternalCheckFailed
when UseTableIDAsPath is true and tableID <= 0, so remove the redundant
conditional that checks f.config.UseTableIDAsPath &&
table.TableNameWithPhysicTableID.TableID <= 0 (the branch after exist check that
returns ErrInternalCheckFailed) to avoid dead/duplicated validation; keep the
call to def.GenerateSchemaFilePath, the FileExists check (f.storage.FileExists),
and the versionMap assignment (f.versionMap[table] = table.TableInfoVersion)
untouched.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a51876cc-08fb-4742-acf1-cc189eb723ed

📥 Commits

Reviewing files that changed from the base of the PR and between 3302635 and 2cc7aa9.

📒 Files selected for processing (11)
  • api/v2/model.go
  • api/v2/model_test.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/config/changefeed.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/sink/cloudstorage/path_test.go

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

✅ Actions performed

Reviews resumed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/config/sink.go (1)

954-965: ⚠️ Potential issue | 🟠 Major

Add UseTableIDAsPath to the compatibility delta check.

This block makes use-table-id-as-path URI-governed, but CheckCompatibilityWithSinkURI still only keys off Protocol and TxnAtomicity. A changefeed update that only toggles UseTableIDAsPath can therefore return early and miss a real URI/config conflict.

🐛 Proposed fix
-	cfgParamsChanged := s.Protocol != oldSinkConfig.Protocol ||
-		s.TxnAtomicity != oldSinkConfig.TxnAtomicity
+	cfgParamsChanged := s.Protocol != oldSinkConfig.Protocol ||
+		s.TxnAtomicity != oldSinkConfig.TxnAtomicity ||
+		s.UseTableIDAsPath != oldSinkConfig.UseTableIDAsPath

Also applies to: 997-1014

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/sink.go` around lines 954 - 965, The change makes UseTableIDAsPath
(s.UseTableIDAsPath) driven from the sink URI but CheckCompatibilityWithSinkURI
still ignores that key, so updates toggling UseTableIDAsPath can bypass
compatibility checks; update CheckCompatibilityWithSinkURI to include
UseTableIDAsPath in the compatibility delta comparison (same style as
Protocol/TxnAtomicity), using the same config key string (UseTableIDAsPathKey)
and comparing util.GetOrZero(s.UseTableIDAsPath) with the parsed/enabled value
from cfgInSinkURI/cfgInFile, and ensure cfgInSinkURI and cfgInFile are populated
consistently where UseTableIDAsPath is parsed so the function can detect and
return conflicts correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 127-131: The generateTablePath function currently returns any
tableID as a string which allows useTableIDAsPath=true to produce "0/..." paths;
change the logic to reject non-positive IDs the same way GenerateSchemaFilePath
does: when useTableIDAsPath is true and tableID <= 0 return an error
(ErrInternalCheckFailed) instead of silently returning "0", and update callers
that build data/index directories (the code paths referenced around the
data/index generation using generateTablePath) to handle/propagate that error so
data/index path construction fails early rather than creating mixed ID/name
paths.

---

Duplicate comments:
In `@pkg/config/sink.go`:
- Around line 954-965: The change makes UseTableIDAsPath (s.UseTableIDAsPath)
driven from the sink URI but CheckCompatibilityWithSinkURI still ignores that
key, so updates toggling UseTableIDAsPath can bypass compatibility checks;
update CheckCompatibilityWithSinkURI to include UseTableIDAsPath in the
compatibility delta comparison (same style as Protocol/TxnAtomicity), using the
same config key string (UseTableIDAsPathKey) and comparing
util.GetOrZero(s.UseTableIDAsPath) with the parsed/enabled value from
cfgInSinkURI/cfgInFile, and ensure cfgInSinkURI and cfgInFile are populated
consistently where UseTableIDAsPath is parsed so the function can detect and
return conflicts correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b4a1808c-9d41-4b3a-904b-92ae604bef64

📥 Commits

Reviewing files that changed from the base of the PR and between 3302635 and 2cc7aa9.

📒 Files selected for processing (11)
  • api/v2/model.go
  • api/v2/model_test.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/config/changefeed.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

Comment on lines +137 to +139
if urlParameter.UseTableIDAsPath != nil {
c.UseTableIDAsPath = *urlParameter.UseTableIDAsPath
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if urlParameter.UseTableIDAsPath != nil {
c.UseTableIDAsPath = *urlParameter.UseTableIDAsPath
}
c.UseTableIDAsPath = util.GetOrZero(urlParameter.UseTableIDAsPath)

Use the util.GetOrZero function to keep the code style

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimized.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/sink/cloudstorage/path_test.go (1)

141-166: Add the partitioned-table case here too.

This only locks in the plain-table path shape. The branch most likely to regress is UseTableIDAsPath=true with EnablePartitionSeparator=true and IsPartition=true, where the partition ID must stay omitted so we don't reintroduce tableID/.../<partitionID>/... duplicates.

💡 Suggested test extension
 func TestGenerateDataFilePathWithTableIDAsPath(t *testing.T) {
 	t.Parallel()

 	ctx, cancel := context.WithCancel(context.TODO())
 	defer cancel()
@@
 	date := f.GenerateDateStr()
 	path, err := f.GenerateDataFilePath(ctx, table, date)
 	require.NoError(t, err)
 	require.Equal(t, fmt.Sprintf("12345/5/CDC_%s_000001.json", table.DispatcherID.String()), path)
+
+	partitioned := table
+	partitioned.TableNameWithPhysicTableID.IsPartition = true
+	f = testFilePathGenerator(ctx, t, dir)
+	f.config.UseTableIDAsPath = true
+	f.config.EnablePartitionSeparator = true
+	f.versionMap[partitioned] = partitioned.TableInfoVersion
+
+	path, err = f.GenerateDataFilePath(ctx, partitioned, date)
+	require.NoError(t, err)
+	require.Equal(t, fmt.Sprintf("12345/5/CDC_%s_000001.json", partitioned.DispatcherID.String()), path)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/path_test.go` around lines 141 - 166, Extend
TestGenerateDataFilePathWithTableIDAsPath to also cover the partitioned-table
case: create a VersionedTableName with IsPartition=true and a non-empty
PartitionID, set f.config.UseTableIDAsPath=true and
f.config.EnablePartitionSeparator=true, populate f.versionMap for that table
version, call f.GenerateDataFilePath(ctx, table, date) and assert no error and
that the resulting path omits the partition ID (matches
"12345/5/CDC_<dispatcher>_000001.json"). This ensures GenerateDataFilePath
respects UseTableIDAsPath + EnablePartitionSeparator when IsPartition is true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 405-440: generateDataDirPath currently reads f.versionMap[tbl]
directly which yields a zero value (0) when the map hasn't been primed; change
FilePathGenerator.generateDataDirPath to check for presence using the comma-ok
idiom (e.g., v, ok := f.versionMap[tbl]) and return a descriptive error if !ok
(mentioning tbl or that CheckOrWriteSchema must run) so callers of
GenerateDataFilePath / GenerateIndexFilePath fail fast rather than producing a
"0" path segment; keep the rest of the path-building logic unchanged and
reference CheckOrWriteSchema in the error message as the priming step.

---

Nitpick comments:
In `@pkg/sink/cloudstorage/path_test.go`:
- Around line 141-166: Extend TestGenerateDataFilePathWithTableIDAsPath to also
cover the partitioned-table case: create a VersionedTableName with
IsPartition=true and a non-empty PartitionID, set f.config.UseTableIDAsPath=true
and f.config.EnablePartitionSeparator=true, populate f.versionMap for that table
version, call f.GenerateDataFilePath(ctx, table, date) and assert no error and
that the resulting path omits the partition ID (matches
"12345/5/CDC_<dispatcher>_000001.json"). This ensures GenerateDataFilePath
respects UseTableIDAsPath + EnablePartitionSeparator when IsPartition is true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 769312cb-ecc2-4a88-8268-0dc98f576431

📥 Commits

Reviewing files that changed from the base of the PR and between 2cc7aa9 and 1e51de0.

📒 Files selected for processing (7)
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/writer.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/config/sink.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use table_id instead of table_name as cdc path.

2 participants